Learning a Probabilistic Model of Event Sequences from Internet Weblog Stories
Abstract
One of the central problems in building broad-coverage story understanding systems is generating expectations about event sequences, i.e. predicting what happens next given some arbitrary narrative context. In this paper, we describe how a large corpus of stories extracted from Internet weblogs was used to learn a probabilistic model of event sequences using statistical language modeling techniques. Our approach was to encode weblog stories as sequences of events, one per sentence in the story, where each event was represented as a pair of descriptive key words extracted from the sentence. We then applied statistical language modeling techniques to each of the event sequences in the corpus. We evaluated the utility of the resulting model for the tasks of narrative event ordering and event prediction.

Story Understanding Systems

Automated story understanding has proved to be an extremely difficult task in natural language processing. Despite a rich history of research in automatic narrative comprehension (Mueller, 2002), no systems exist today that can automatically generate answers to questions about the events described in simple narratives of arbitrary content. Part of the difficulty is the amount of commonsense knowledge that is necessary to adequately interpret the meaning of narrative text or the questions asked of it. Accordingly, progress in this area has been made by limiting the story understanding task to specific domains and question types, allowing for the hand-authoring of the relevant content theories needed to support interpretation. Mueller (2007) describes a state-of-the-art story understanding system that follows this approach, where a reasonably large number of questions about space and time can be answered with stories involving people eating in restaurants. By limiting the scope of the problem to restaurant stories, all of the relevant expectations about the activity of going to a restaurant could be formalized and integrated into a natural language processing pipeline.

Although promising, approaches like this suffer from problems of scalability. A significant amount of knowledge engineering effort is required to encode expectations about this one activity context, and others have demonstrated that many hundreds (or more) of these schemas are included in our commonsense understanding of everyday activities (Gordon, 2001). Accordingly, there is a strong need for new methods to acquire expectations about narrative events on a much larger scale.

A promising alternative was explored by Singh & Barry (2003), where commonsense expectations about everyday activities were acquired from thousands of volunteers on the web as part of the Open Mind Commonsense Project. Contributors to this knowledge base (StoryNet) authored these expectations as sequences of natural language sentences. This allows for more scalable knowledge engineering, but requires additional processing to transform natural language expressions into knowledge that can be manipulated in automated story comprehension applications.

In many respects, the (fictional) stereotypical activity descriptions that are contributed by volunteers to this knowledge base are not so different from the real (nonfiction) stories that people write in their Internet weblogs, describing the experiences of their daily lives. Given the tens of millions of Internet weblogs in existence, we considered the question: Can the stories that people write in Internet weblogs be used to acquire expectations about everyday event sequences?

In this paper, we describe an approach for acquiring a probabilistic model of event sequences through the large-scale processing of stories in Internet weblogs. This approach begins with the processing of an existing corpus of stories automatically extracted from hundreds of thousands of Internet weblogs. Each event within these stories is represented as a predicate-argument pair, one pair for each sentence, consisting of a main verb in the sentence and the head word of its patient argument. We then describe the novel application of existing language modeling technologies to create a probabilistic event ...

Copyright © 2008 Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Proceedings of the Twenty-First International FLAIRS Conference (2008)
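As a rough illustration of the idea described above, the sketch below treats each story as a sequence of (verb, patient) events and fits a bigram model over those events. It is not the authors' implementation: the bigram order, the add-alpha smoothing, the toy stories, and every function name here are assumptions made for the example, and the step that extracts events from sentences with a parser is omitted.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each story is a list of (verb, patient) events,
# one per sentence. Real events would come from a parser, not shown here.
stories = [
    [("enter", "restaurant"), ("order", "food"), ("eat", "meal"), ("pay", "bill")],
    [("enter", "restaurant"), ("order", "food"), ("pay", "bill")],
    [("order", "food"), ("eat", "meal"), ("leave", "tip")],
]

# Assumed padding markers for story start and end.
START, END = ("<s>", "<s>"), ("</s>", "</s>")


def train_bigram(stories):
    """Count event bigrams, padding each story with start/end markers."""
    bigrams = defaultdict(Counter)
    vocab = {END}
    for story in stories:
        padded = [START] + list(story) + [END]
        vocab.update(story)
        for prev, nxt in zip(padded, padded[1:]):
            bigrams[prev][nxt] += 1
    return bigrams, vocab


def prob(bigrams, vocab, prev, nxt, alpha=1.0):
    """Add-alpha smoothed estimate of P(nxt | prev) (smoothing is an assumption)."""
    counts = bigrams.get(prev, Counter())
    return (counts[nxt] + alpha) / (sum(counts.values()) + alpha * len(vocab))


def predict_next(bigrams, vocab, prev):
    """Return the most probable event to follow `prev`."""
    return max(sorted(vocab), key=lambda e: prob(bigrams, vocab, prev, e))


def sequence_score(bigrams, vocab, events):
    """Product of bigram probabilities for one ordering of events."""
    padded = [START] + list(events) + [END]
    score = 1.0
    for prev, nxt in zip(padded, padded[1:]):
        score *= prob(bigrams, vocab, prev, nxt)
    return score


bigrams, vocab = train_bigram(stories)
best = predict_next(bigrams, vocab, ("enter", "restaurant"))
```

In this sketch, `predict_next` corresponds loosely to the event prediction task and `sequence_score` to the event ordering task: candidate orderings of the same events can be compared by their scores.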
Similar resources
StoryUpgrade: Finding Stories in Internet Weblogs
The phenomenal rise of Internet weblogging has created new opportunities for people to tell personal stories of their life experience, and the potential to share these stories with those who can most benefit from reading them. One barrier to this new mode of storytelling is the lack of accessibility; existing Internet search tools are not tailored to the unique characteristics of this textual g...
A Probabilistic Model of Learning Fields in Islamic Economics and Finance
In this paper an epistemological model of learning fields of probabilistic events is formalized. It is used to explain resource allocation governed by pervasive complementarities as the sign of unity of knowledge. Such an episteme is induced epistemologically into interacting, integrating and evolutionary variables representing the problem at hand. The end result is the formalization of a p...
Mining Commonsense Knowledge From Personal Stories in Internet Weblogs
Recent advances in automated knowledge base construction have created new opportunities to address one of the hardest challenges in Artificial Intelligence: automated commonsense reasoning. In this paper, we describe our recent efforts in mining commonsense knowledge from the personal stories that people write about their lives in their Internet weblogs. We summarize three preliminary investiga...
Open-domain Commonsense Reasoning Using Discourse Relations from a Corpus of Weblog Stories
We present a method of extracting open-domain commonsense knowledge by applying discourse parsing to a large corpus of personal stories written by Internet authors. We demonstrate the use of a linear-time, joint syntax/discourse dependency parser for this purpose, and we show how the extracted discourse relations can be used to generate open-domain textual inferences. Our evaluations of the disco...
Using Weblog to Promote Critical Thinking – An Exploratory Study
Weblog is an Internet tool that is believed to possess great potential to facilitate learning in education. This study wants to know if weblog can be used to promote students’ critical thinking. It used a group of secondary two students from a Singapore school to write weblogs as a means of substitution for their traditional handwritten assignments. The topics for the weblogging are taken from ...
Publication date: 2008